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Abstract. Neurons in the nervous system convey information to higher brain regions 
by the generation of spike trains. An important question in the field of computational 
neuroscience is how these sensory neurons encode environmental information in a way 
which may be simply analyzed by subsequent systems. Many aspects of the form and 
function of the nervous system have been understood using the concepts of optimal 
population coding. Most studies, however, have neglected the aspect of temporal 
coding. Here we address this shortcoming through a filtering theory of inhomogcneous 
Poisson processes. We derive exact relations for the minimal mean squared error 
of the optimal Bayesian filter and by optimizing the encoder, obtain optimal codes 
for populations of neurons. We also show that a class of non-Markovian, smooth 
stimuli are amenable to the same treatment, and provide results for the filtering and 
prediction error which hold for a general class of stochastic processes. This sets a sound 
mathematical framework for a population coding theory that takes temporal aspects 
into account. It also formalizes a number of studies which discussed temporal aspects 
of coding using time- window paradigms, by stating them in terms of correlation times 
and firing rates. We propose that this kind of analysis allows for a systematic study 
of temporal coding and will bring further insights into the nature of the neural code. 
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1. Introduction 

Populations of neurons transmit information through their joint spiking activity. One 
of the main goals of computational neuroscience is to gain insight into the mechanisms 
which shape the functional activity of neurons, and to better understand and possibly 
decode the information encoded by neurons. In the study of optimal population codes, 
it is usual to start from considerations about the nature of the neural information 
processing and of the tasks performed by the neurons to search for the best possible 
coding strategies (see for example [H Ej |3l H]). Considering the encoding and decoding 
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of stimuli from the information-theoretical viewpoint, we can develop a theory for the 
optimal Bayesian estimator for any given task. Given a particular estimation task and 
a noise model, the Bayesian estimator is the best estimator for the task, minimizing 
a well-defined cost function (see [5]). In the broader context of Systems Theory, spike 
train encoding and decoding can be viewed within the context of optimal filtering based 
on point process observations |6]. 

Here we focus on a specific question: Given a stimulus ensemble and the optimal 
Bayesian decoder, how can we design an encoder to minimize the mean squared error in 
a stimulus reconstruction/state estimation task? That is, we would like to determine an 
optimal response distribution, to make the reconstruction of the stimulus easier. This 
has been the subject of previous investigations in a number of contexts. In [71 [21 E] 
the authors sought to answer the question in the framework of static stimuli. These 
papers established the existence of a finite optimal tuning width for bell-shaped tuning 
functions. In [1] the authors study a similar problem in the context of finite-state Markov 
models. In the Deep Belief Network literature, the study of Autoencoders, which deal 
with a very similar question, has recently received a lot of attention as well [8] . We focus 
here on the framework proposed in [9], where a dynamic stimulus is observed through 
noisy spike trains and is decoded online. This falls within the general theory of Bayesian 
filtering of point processes [6]. So, given an ensemble of stochastic stimuli and a family 
of possible encoders, we seek the encoder that minimizes the mean squared error of the 
Bayesian decoder (filter) in a state estimation task. 

We observe that the task of designing an optimal encoder- decoder pair in our setting 
is very different from the case often studied within Information Theory [10], where 
real-time constraints are absent and solutions are asymptotic in nature, being based 
on infinitely increasing block sizes. In the present real-time setting, we are interested 
in guaranteeing good real-time performance for finite observation times. In fact, the 
problem studied in this work falls into the category of a decentralized multi-agent 
sequential stochastic optimization problem (viewing the encoder and decoder as agents), 
the general solution to which is not known [TT]. The approach taken here is based on 
selecting a decoder using optimal Bayesian decoding assuming a fixed encoder, and then 
optimizing the encoder itself. 

A central aspect of neural coding is its speed. Neural populations typically 
perform computations within less than 50 milliseconds, accounting for the fast responses 
characteristic of animals [121 [13]. Most studies of optimal population coding, however, 
still resort to the paradigm of time-slots, in which a (mostly static) stimulus is presented 
to a network for a given time window, spikes are pooled for each neuron and a rate 
code is assumed [3l H]. Here we follow a different path, focusing on the dynamical 
aspect of natural stimuli, and developing the optimal Bayesian filter for a dynamic state 
estimation problem (this has been hinted at in [9] and developed for finite-state models 
in [1]). In filtering theory, one tries to estimate the value of a certain function of the 
system's state, given noisy observations of that system's state in the past. 

Drawing from the theory of stochastic dynamics, we present a model for the joint 
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dynamics of signal and noise, where the signal is assumed to be a diffusion process and 
the noise arises from the Poisson spiking of the neurons. For the subclass of linear 
stochastic processes, we find that the mean squared error (MSE) is equal to the average 
posterior variance of the filter. This can be shown to be independent of the specific 
signal, and we can analyze the marginal dynamics of the variance of the filter. Analyzing 
this marginal dynamics, we obtain results regarding the value of the mean squared error 
through simulations and in a mean-field approximation, which hold in the equilibrium 
as well as in the relaxation period. In spite of the simplifications involved, the mean- 
field results are shown to be in very good agreement with the the full Monte Carlo 
simulations of the system. 

For the linear stochastic processes considered we obtain an interesting relation 
for the timescales involved. Specifically, we find that whenever the average interspike 
interval is larger than the correlation time of the observed process, the distribution of 
possible errors diverges around the upper bound for the error. This implies that the 
firing rate of the sensory neurons must exceed a threshold in order to be able to track 
the input properly. The threshold is defined by the statistics of the stimulus. This 
relation holds exactly for the class of processes considered, and is specially pronounced 
in the Ornstein-Uhlenbeck process, where we present a closed-form solution of the full 
distribution of the posterior variance. We also provide an exact analysis of the prediction 
error which holds for the general class of processes discussed. Furthermore, we show that 
for the stimuli considered there is a finite optimal tuning width for the tuning functions 
which minimizes the error in the reconstruction of the stimulus. The dependence of this 
optimal tuning width as a function of the stimulus properties is discussed in the context 
of an ecological theory of sensory processing |14j . 

Much effort has been devoted to understand how spiking neural networks can 
implement operations such as marginalization, optimal inference under uncertainty and 
others. The finding that humans and animals combine uncertain cues from separate 
senses near-optimally [151 IIHl [IZ] has given a lot of traction to this line of research. We 
note that this paper takes a different path. We seek the encoder that saturates the limits 
given by the Bayesian optimal decoder. In [18], the authors have considered a similar 
problem, but tried to devise a spiking neural network that would optimally decode the 
stimulus. Similarly, in [19] a procedure based on divisive normalization was presented 
that performs marginalization optimally for similar problems, and was applied to a 
filtering problem similar to the one considered here. We, however, focus on the optimal 
design of an encoder given an ensemble of stimuli assuming optimal decoding and study 
their relation to the statistical structure of the stimulus. This has been studied before 
in different settings (e.g. [H |2l [3l [20]). 

We propose that the framework of Bayesian filtering is more suited to study 
population coding than the usual time-binning method, as it allows for a natural 
inclusion of the time scales of the observed process and of the internal dynamics of the 
neurons. We have shown that for a simple model, we find a general relation connecting 
the time scales of the population spiking and that of the stimulus observed with the 
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error of an ideal observer. This has imphcations for the optimal tuning width of tuning 
functions of neurons. We believe that different strategies for temporal coding in different 
sensory systems might be traced to similar arguments, as has been done in [H] for the 
limit of slow temporal stimulation. 

The main contribution of the present paper is in providing insight, combined with 
closed form mathematical expressions, for the reconstruction error of stimuli encoded by 
biologically motivated point processes. While precise expressions can be obtained only in 
limiting cases, they provide an essential starting point for a theory of optimal encoding- 
decoding within a biologically relevant framework, and demonstrate the importance of 
adaptive encoding to the neural processing of dynamic stimuli. Note that even in the 
simple case of linear processes observed by linear systems (e.g., [21]), it is in general 
impossible to obtain closed form expressions for the estimation error, and one usually 
resorts to bounds. 

1.1. Structure of the Paper 

In section [2] we present the framework used. We will derive a number of results on 
the Minimum Mean Squared Error (MMSE) for a stimulus reconstruction task with 
a Gaussian process observed through Poisson spike trains. A thorough analysis is 
presented for both Ornstein-Uhlenbeck processes and for smoother, non-Markovian, 
processes. In section |3] we will discuss the application of these results to the study of 
optimal population codes. Namely we show the scaling of the optimal tuning width 
and its corresponding MMSE as a function of the correlation structure of the process 
and the overall firing rate. We finalize by discussing the implications of the presented 
framework to the field and future applications of our framework. 

2. Reconstructing Dynamic Stimuli Based on Point Process Encoding 

The problem of reconstructing dynamic stimuli based on noisy partial observations falls 
within the general field of filtering theory (e.g., [21]). Consider a dynamic stimulus x{t), 
X G M"-, which is observed through a noisy set of sensors leading to output y{t), y G M™. 
For example, x{t) could represent the position and velocity of a point object, and y{t) 
could represent the firing patterns of a set of retinal cells. The objective is to construct 
a filter, based only on the observation process, which provides a good estimator for the 
unknown value of the stimulus. Formally, denoting by t/([0,t]) the set of observations 
from time to the present time, we wish to construct an estimator i(j/([0,t])) which 
is as close as possible (in some precisely defined manner) to x{t). A classic example of 
filtering is the case of a stimulus x{t) generated by a noisy linear dynamical system, 
and an observer y{t) which is based on a noisy linear projection of the stimulus. The 
classic Kalman filter (e.g., [21]) then leads to the optimal reconstruction in the MMSE 
sense. 
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For a filtering task, the use of an Cp norm as a cost function is a natural choice, 
given that we are interested in reconstructing the system's state as precisely as possible. 
Here we choose to use the £2 norm as a cost function for our reconstruction task. This 
is a natural choice for a filtering task (see [21]) and has been frequently used in studies 
of optimal population coding [H El [3]. Another popular cost function for studies of 
optimal population coding is the mutual information between the input distribution 
and the conditional response distribution This allows one to find the code that 
optimally codes the information contained in the input in its response. Though recent 
theoretical advances are sketching out the relationship between information- and MMSE- 
optimal codes [221 ESI El] , there seems to be no simple equivalence between these two 
cost functions. In Gaussian additive channels, the MMSE is equal to the derivative of 
the mutual information between input and output with respect to the signal-to-noise 
ratio. Though similar relationships have been derived for Poisson processes, these hold 
only for linearly modulated inhomogeneous processes and do not relate directly to the 
MMSE [211 EH]- Furthermore, these results have been derived only for single point 
processes, not for populations thereof. We therefore choose to work strictly with the 
£2 cost function, as this is not only a natural choice for the problem at hand, but also 
allows for a number of analytical results. 

We consider the case of linear Gaussian stochastic processes observed by a 
population of neurons with unimodal tuning functions. This is analogous to considering 
stimuli drawn from a Gaussian process prior, as has been done in [9]. We will have a 
prior distribution over the stimulus x{t) given by a Gaussian process with zero mean and 
covariance function K{t,t') = {x{t)x{t')), where the angled brackets denote the average 
over the ensemble of Gaussian processes. 

Rather than considering general Gaussian processes we will focus on a class of 
processes which are particularly amenable to analytic investigation. This will allow us 
to consider both simple Markov Gaussian processes (the so-called Ornstein-Uhlenbeck 
process) and higher order Markov Gaussian processes. We will consider stochastic 
processes described by a stochastic differential equation of the form 



where W{t) is a scalar Wiener process. We can find the covariance of the process by 
calculating the Fourier transform, computing the power spectrum and then reversing 
the Fourier transform. We will have 



This power spectrum leads to stochastic processes with the so-called Matern 
kernel [261 P- 211]. If unobserved, the distribution over x will converge to 
the equilibrium distribution, given by a Gaussian distribution with zero mean 
and covariance = rf processes can also be written as 

multidimensional first-order processes by defining X(t) = (Xi(t), X2(t), . . . ,Xp(t))''^ = 




(1) 




(2) 
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{x{t),x^^\t), . . . ,x^^~^\t)y^ , where x'^'^\t) denotes the i-th derivative of of and its 
associated stochastic differential equation 

dX(t) = -rX(t)dt + ifdW(t), (3) 

where W(t) is a P- dimensional Wiener process. F and H are defined in Appendix A[ 



Note that the process x{t) itself will not in general be Markovian. The process X(t) 
of x{t) along with its first P — 1 derivatives will, however, be Markov. This allows 
us to treat smooth non-Markov dynamics as the marginal case of a higher- dimensional 
Markov stimulus and so to draw from the theory of Markov stochastic processes, which 
is very well-established [27] . 

The spike trains will be modeled by inhomogeneous Poisson processes N^(t), 
namely, Poisson processes whose rates are functions of the stochastic process X(t). 
More precisely, N'^{t) represents the number of times neuron m has spiked since the 
beginning of the experiment. Furthermore, we will assume each spike train N^{t) to be 
conditionally independent of all others, that is, given a value of the stimulus X(t) the 
spiking of each neuron is independent of all others. The function relating the stimulus 
X(t) to the rate of the Poisson process is often referred to as a tuning function, and will 
be denoted by Am(X(t)). We will consider unimodal tuning function of the form 

A^(x(t)) = 0expf-('^(^)"^" 



2a2 

where 6m is the preferred stimulus value of neuron m. Tuning functions of this form 
are often found in orientation-selective cells in visual cortex of mammals and in place 
cells in the hippocampus [281 [291 [30]. Fo^^ multi-dimensional stimuli, we can write these 
more generally as 

A„,(X(t)) = 0exp (-(X(t) - e„)^A+(X(t) - e„)/2) . 

We will prefer this notation as it allows us to derive a general theory which also 
holds for multidimensional stimuli. The case of a one-dimensional P-th order process 
along with its P — 1 derivatives would be recovered by setting Afj = and 
©m = i^m, 0, . . . , 0). While in many cases biological tuning functions are unimodal, we 
have chosen to work with Gaussian functions for reasons on analytic tractability. 
The likelihood of a spike train {A^"*([0, t])} is given by (see [3T] ) 

/:({iV™([0,t])}|X([0,t])) = exp|^ J dN"^it)\og{XUnm-Yl J d^A„(X(t)) 

We have denoted here by {A^™([0,t])} the value of all spiking processes N"^{s) for any 
instant s such that < s < t, and likewise X([0,t]) denotes all values of X(s), for all 
s such that < s < t. With this likelihood we can then find the posterior distribution 
for X([0,t]) using B ayes' rule. 



P(X([0,t])|{Ar™([0,t])}) oc /:({A^™([0,t])}|X([0,t]))P(X([0,t])). 
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Averaging out the values of X(t) up to but excluding the time t, will give us the time- 
dependent Gaussian posterior P(X(t)|{A^'^([0, t])}). If the likelihood were Gaussian in 
X we could use results from conditional distributions on Gaussian measures to average 
over X([0, t)) (for more details, see [9]). This is indeed the case when the tuning functions 
are densely packed, as we will see in the following. 

Let us then turn to the spiking dynamics of the whole population. Because each 
neuron spikes as a Poisson process with rate Am(X(t)), and the processes N^^{t) are 
conditionally independent when conditioned on X(t), the process N{t) = Ylim^^i^) 
also Poisson with rate A(X(t)) = 'Ylim^mO^i^))- neurons' tuning functions are 

dense, however, the sum does not depend on X(t) and therefore the overall firing rate 
of the population does not depend on the stimulus at all. The rate can be estimated 
for equally spaced tuning centers 6m by considering it a Riemann sum [2]. Assuming 
the tuning centers 6m are tiled in a regular lattice with spacing A6 along each axis, we 
have 

V- / . 0K27r)^'^'^^(^)det*(A)l'/' 
A = 0^exp (-(6m -X)^A+(6m -X)/2) ^ ^ ' ^ , 



where det*(yl) is the pseudo-determinant of A defined in Appendix B The assumption 
of dense tuning functions is clearly very strong as the number of neurons necessary to 
cover an n-dimensional stimulus space grows exponentially with n. Note however, that 
the deviation from this approximation is very small when the tuning center spacing is 
of the order of the diagonal elements of A, i.e. when the tuning functions have a strong 
overlap. If this assumption is violated we would have to treat the filtering problem 
through the stochastic partial differential equation for the probability, as was done in 
[6] for general Poisson processes. 

Given the assumption of dense tuning functions and due to the prior assumption 
about x(t), the likelihood becomes Gaussian, and the posterior distribution 
P(X(t)|{A^"*([0, t])}) will also be Gaussian. Since the dynamics of X(t) is linear, the 
Gaussian distribution will be conserved and, in the absence of spikes, the mean /x(t) and 
covariance S(t) = ((X(t) — /i(t))(X(t) — fi(t))~^') will evolve as 

dixit) 



dt 

and 

dS(t) 



-r/i(t), (4a) 

-rs(t) - s(t)r^ + z/^. (45) 



dt 

If a spike from a neuron m occurs at a time t, the posterior distribution of X(t) gets 
updated via Bayes' rule. The prior is then given by A/'(X(t); fi{t), S(t)) and the likelihood 
is A/'(X(t); 6m, A). We denote the mean and covariance immediately after the spike at 
time t by //(t"*") and S(t"'~). By standard Gaussian properties we obtain 

/i(t+) = (s(t)-i + (s(t)"VW + A+ej 

= liit) + (S(t)"i + (6m - /i(t)) (5a) 
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= S(t) + (s(t)-i + a+)"'a+s 



it) 



{5b) 



This fully determines the optimal Bayesian filter, namely, it is given by -Pt(X) = 
A/'(X; /i(t), where the evolution of fi(t) and S(t) is given by ( l4aj) and (l45l) in 

the absence of spikes and by (15 a\) and (IsB whenever there are spikes. It is interesting 
to observe that the dynamics of the mean and variance between spikes, given in (l^oj) 
and (l46l) . is precisely that obtained for the continuous time Kalman filter [21] when 
observations are absent. This is not surprising since between spikes we are tracking a 
linear dynamical system, as does the Kalman filter. 

In order to determine the properties of the optimal encoder, we aim at obtaining 
expressions for the mean-squared error of the optimal filter. This gives us a measure 
of the best-case performance that can be achieved for a signal reconstruction task from 
the observation of the spike train. Specifically, we want a measure of the average 
performance over all stimuli and all spike trains. For that, note that because of the 
Gaussian nature of the stimulus and the linear dynamics of the prior, the evolution of 
the posterior variance does not depend on the spike trains of each neuron, but rather 
only on N{t), the total spike count of the population. The average of the posterior 
covariance matrix is given by 



Note that the diagonal terms give us the mean squared error on the estimation of every 
coordinate of X(t). More specifically, the MMSE in the estimation of x{t) is given by 
Suit). We can simplify this by noting that fi{t) = E(X(t)|{A^"'([0, t])}) and evaluating 
the average over P{X{t)\{N"'{[0,t])}) [2]. We have 



where in the last step we have used the fact that the dynamics of and therefore 
its average, only depend on the population spike count N{t). Up to the last step, 
this derivation is generally valid, though one must take care to consider the marginal 
distribution of the observations, averaging out the signal. Note that, although we can 
give an account of the temporal evolution of the mean squared error this cannot be 
given a biological interpretation in the absence of an experiment to contextualize the 
time dependence of the error. However, in the limit of long spike trains, the MSE 
will converge to its equilibrium value (see figure [1] and figure [3]) and we can study the 
equilibrium statistics of S(t) to determine the average MSE of the reconstruction task. 

We will look at the transition probability for S(t) after marginalization over N{t). 
Consider the probability P{T,',t + dt\T,,t), of the covariance having a value E' at time 
t + dt given that at time t the covariance was S. For infinitesimal dt, S' is either given 
by ( l46j) . or with a probability A dt there is a jump as specified in ( IsB . with A = J2m. -^m- 
This will yield the transition probability 



5(t) = ((x(t)-Mt))(x(t)-Mt)r> 



X,{N"^{lO,t])} ■ 



S{t) — (S(t))|^™([Q^^])| — (S(t))^Qg^^J-) , 



P(S', t + dt\E, t) = (1 - A dt)S (S' - S - dt{H^ - FS - SF^)) 



(6) 
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The evolution of the time-dependent probabihty over the covariance matrices P(S,t) 
will be given by the Chapman- Kolmogorov equation |271 p. 47] 

^^^^ = -V [i?(S)P(S,t)] + AC(S)P ((S-i - -AP(S,t), (7) 

where P(S) = -IS - SF^ + H'^. Note that this gives us the MSE{t) through the 
average of S over the distribution P{Tj,t). The term C(E) arises from the integration 
of the second Dirac delta function in (|6]) and is given by C(E) = 1/| det(J(S))|, where 
J(S) is the Jacobian matrix 



The exact choice of the ordering of the indices is not of importance, as it would only 
account for a change in the sign of the determinant in C(S), which only enters the 
equation through its absolute value. This term is not of great importance, however, as 
it will be cancelled when we calculate the evolution of the averages. 

We can now formalize what we mean by the equilibrium condition mentioned above. 
Note that under the stimulus and noise models proposed the evolution of the distribution 
of the error is given by (171). Given some initial condition for P(E,to), the distribution 
will evolve and eventually it will reach an equilibrium, such that c?tP(S,t) = 0. We 
are interested in this equilibrium regime, as it provides the average performance of the 
optimal filter after all transients from the initial conditions have vanished. 

The evolution of S{t) = (S)^ can be found easily from ([7]). Using 

^ ^dE/(S)P(S,t)= /dS/(S)-^^^^''^ 



dt ^ V ^ V w I . V y 



and integrating by parts we obtain 
d(S), _ 



dt 

where (/(S))^ = f dS /(S)P(S, t). This cannot be treated exactly, though, as the 
nonlinear averages on the right hand side are intractable even for simple cases. Similar 
relations for the evolution of the average mean and variance of a filter were presented 
in [6] as approximations to the true filter (based on discarding higher order moments). 

The evolution of the distribution over covariance matrices is thus determined. 
However, evaluating the averages is intractable even for the case of dense neurons when 
A is independent of X(t). We will therefore look into three different ways of treating 
([7]). First, we can simply simulate the population spiking process N{t) as a Poisson 
process to obtain samples of P(E,t). By averaging over multiple realizations of N{t) 
we can estimate the average value of and by monitoring its evolution we can 
determine when it has relaxed to the equilibrium. The second possibility is to analyze 
(IH]) in a mean-field approximation, i.e. to simply replace all averages of the form (/(S)) 
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with f{{T,)), disregarding all fluctuations around the mean. We would then have the 
dynamics 

^ = -r (S), - (S), T^ + H' + X {{m);' + A"-) (S(t)), . (9) 

A third possibility is to analyze ([7]) directly. This leads to a number of interesting 
results for the one-dimensional Ornstein-Uhlenbeck process, but the generalization to 
higher dimensions is not straightforward. We will proceed by analyzing the case of the 
Ornstein-Uhlenbeck process flrst. Subsequently we will look at the case of smoother 
linear stochastic processes, which are produced by Gaussian processes with Matern 
kernels [26]. The generalization to linear diffusion processes is straightforward. We 
will flnalize this section with a discussion of the prediction error for the optimal fllter 
considered. 



2.1. Filtering the Ornstein-Uhlenbeck Process 

As is well known, the OU process is the only homogeneous Markov Gaussian process in 
one dimension, and is therefore particularly convenient as a starting point for analysis 
[27] . When we consider the OU process described by the stochastic differential equation 

dx{t) = --fx{t)dt + r]dW{t), 

the analysis we presented above is greatly simplified. The evolution of the posterior 
variance s is then given simply by 

^ = -27(s), + r^^-A/^\ . (10) 
dt + s / ^ 

We will denote the one- dimensional variance by s, reserving S for the multidimensional 
covariance matrix. In the case of the OU process we can give a more complete account 
of the distribution of the errors for the filter. For the one- dimensional case ([7]) simplifies 

to 

dP{s,t) d r.^ 2X0/ , ^ f Vr^f 



dt ds 

Clearly P{s,t) = 0,Vs < 0. In the equilibrium we will have 

- ^ [(27. - ,^)Pis)] - A (^) ^ (^) - ms). (12) 

This is a delayed-differential equation with nonlinear delays in s. To see this, note 
that for every s such that < s < a^, we have — s) > s. For s > a^, this 

term becomes meaningless, as the posterior variance will always fall below after the 
observation of a spike. This means, that for any s > 0, the derivative of P{s) is a 
function of P{s) and P(a^s/(a^ — s)). Defining the function j{s) = cP'sjio? -|- s) we 
can define intervals Jq = [j(so),So), I\ = [j^(so), j(so)), . . . with sq = rf/1'-^. We will 
then have that given the solution of P{s) on /„, the solution in In+i is the solution 
of a simple inhomogeneous ordinary differential equation with a continuity condition 
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on j"(so). This can be simply solved through numerical integration schemes. We will 
show in Appendix C that in the equilibrium P{s > 77^/27) = 0. We therefore define the 



boundary conditions as Peqis) = 0, Vs > rf /2^. This will imply that the jump term in 
(fT2|) will be absent whenever the jump originates in the domain s > 77^/27. This is the 
case when s E Iq. For these values of s we arrive at 

A [(27s - v')Pe,{s)] - \Pe,{s) = 0. (13) 

This is solved by 

Pe,is) = C (^^^ - , Selo. (14) 

We can use this solution as a boundary condition to solve f ll2p as a delayed-differential 
equation with variable delays, but an analytical solution is not available for the 
subsequent intervals. In figure [2] we present the numerical solution of (1121) alongside 
with histograms from simulations. Note that the solution in (|T4|) is exact for the interval 
Jo and is therefore in significant agreement with the simulations. In the subsequent 
intervals numerical errors degrade the precision of the solution. Especially when a is 
very large the intervals /„ will get smaller, and the numerical integration of ( !T2|) will 
become less stable. 

This solution can also be obtained in the limit of very small firing rates, that is 
when A <^ 27. We can then assume that between two spikes the variance has already 
relaxed to its free equilibrium value sq. A spike at time ts will then take the variance 
to s' = j(so). The evolution will then be given by 



s{t) = e-2^(*-*^)s' + Soil - e-2^(*-*»)^ 



t-ts = -—log ' 



which can be inverted into 



27 \ So- s' 

We can then write the distribution over s as a distribution over interspike intervals 
T = t — tg. We know the probability distribution over r is given by an exponential with 
coefficient A. Changing the variables, we will have 



P{s) = P{t) 



dr 



ds 



OC e e '. 



Which in turn gives us (fT4l) for the equilibrium distribution of s. Note that this 
derivation only holds in the limit of A <^ 27, while the deduction above for the interval 
[s'. So) holds irregardless of the value of the parameters. The mentioned limit is very 
useful in the multidimensional case, however. 

In the mean-field approximation for the equilibrium condition we will have 



2 



|^-2o..,^-A-^. (15) 
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Figure 1: General setup of the inference task. The upper panel shows the true stimulus 
or signal x{t) in red solid line, the filtered posterior mean in the blue dotted line and 
the shading represents the posterior probability distribution. The lower panel shows the 
evolution of the posterior variance. The red dotted line gives the sample variance of the 
sample in the upper panel, the solid black line shows an average over 100 realizations 
and the blue line gives the evolution of the mean-field equation with the same initial 
conditions. 



This gives a remarkably good account of the equilibrium properties of the system, as is 
show in figure [TJ Surprisingly, the mean-field equation gives a very good account of the 
transient behavior of the error as well as of the equilibrium. 

We can also derive a simple Gaussian approximation for the rescaled inverse 
variance z = rf/ijs). Note that the distribution of the inverse variance at the 
equilibrium is given by 

d ^ 

- — [72(2 - Z)P,,{Z)] + \P,^{Z -^)~ ^Pe,{z) = 0. (16) 

In the limit where r]^ j^o? ^ z we can Taylor expand the second term to second order. 
In this regime we will have very broad tuning functions and each individual spike will 
have little effect. This can be thought of as a diffusion limit. Further linearizing 
the drift term around its equilibrium we will obtain the Van Kampen expansion for 
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a = 1.0, (^ = 1.0 a = 0.4, (t) = 1 .0 




Figure 2: Different treatments of the distribution of tlie variance for different sets 
of parameters of tlie encoder for a first-order OU stimulus. Tlie red line shows the 
histogram obtained through simulations. The blue line shows the numerical integration 
of (1121) using the mentioned boundary condition. The black line shows the Van Kampen 
expansion given in the text. Note that the numerical integration of f|T2l) provides a very 
good account of the simulated histogram, except for very large values of a, when the 
integration becomes unstable. The Van Kampen approximation in turn provides a good 
account for the limit of large firing rates, for large a as well as for large (f). 



the problem [27]. It will lead to a Fokker-Planck equation whose solution is the 
Gaussian distribution P,{z) = A/" (^1 + [1 + X6/^]^^^ , A^^ [47]"^ [1 + A(5/7]~^/^) . By 
writing P{s) = Pzlri"^ /'~fs)ri'^ /'-fs'^ we can then recover the distribution over the variances. 
These results are all compared in figure El 

2.2. Filtering Smooth Processes 

In section 12.11 we considered Markovian Gaussian processes. However, within an 
ecologically oriented setting, it is often more natural to assume smooth priors [9]. 
Since in general smooth priors lead to non-Markovian processes, which are very hard 
to deal with mathematically, we look at a special case of a smooth non-Markovian 
prior. Specifically, we will look at the multidimensional embedding X(t) of the smooth 
processes given by ([3]). We choose to study these processes as they allow us to consider 
smooth stochastic processes with the same set of parameters as were used to consider the 
OU process. Figure |3] shows the general inference framework for higher-order smooth 
processes. 
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Figure 3: Inference task for the P = 2 process. The upper panel shows the true stimulus 
or signal x{t) in red solid line, the filtered posterior mean in the blue dotted line and 
the shading represents the posterior probability distribution. The lower panel shows the 
evolution of the posterior variance. The red dotted line gives the sample variance of the 
sample in the upper panel, the solid black line shows an average over 100 realizations 
and the blue line gives the evolution of the mean-field equation with the same initial 
conditions. 



Taking the observation matrix Aij = Si^iSija"^ with pseudo-inverse A'^ = Si^iSij/a"^, 
the form of A will allow us to somewhat simplify equation ( l56|) and we can write a 
simple mean-field equation for the evolution of (S)^ = We have in the mean-field 

approximation 

m._rm)-W)T^.H^-^^^il^. (17) 

at 1 + 

This equation still describes the average performance of the Bayesian filter remarkably 
well, not only in the equilibrium phase but also in the transient period (see figure [3]). 
Note that the sole contributor to the nonlinearity in the equation above is the first 
column of the covariance matrix. Given Sij solving for the remaining elements of the 
matrix is a matter of linear algebra. 

The argument made in Appendix C cannot be rigorously be made here, as we have 
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S evolving in a high dimensional space with nonlinear jumps. In principle, one could 
map out a subspace such that P(E,t) is zero outside of it, and then proceed to locate 
a set of S such that (S~^ — A^)~^ falls in it, and then seek to solve equation ([7]) in 
that domain. This could, in principle, be made rigorous, but as we will see below, this 
approach is not as useful when considering smoother processes. To see why, let us study 
the small A limit as we did in the OU process. Assuming A <C V ■ -B, we can assume that 
the covariance matrix has nearly relaxed to its equilibrium value So every time there is 
a spike. Whenever there is a spike the covariance matrix jumps to S' = (Sq ^ + A^)~^. 
The free evolution is then given by 



S(t) = e-(*-*«)^S'e-(*-*°)r^ + / dse-(*-^)ri7V(*-^)r' 

J to 

Changing variables to r = t — to we have 

SM = e-^^S'e-^r" + / dse-^^i/V^^" 



Clearly there is no simple solution to these matrix equations as there is for the OU 
case, but we can still use the same approach as before numerically. We can assume an 
unknown function of S that gives us the time since the spike t(E). We can then proceed 
as before and evaluate the marginal distributions. We would have for Si^i 

P(r(Sn)) e-^- 



dr I I dr 



When we analyzed the analog of this relation for the OU process we could find a simple 
correspondence between the divergence of the distribution around the equilibrium value 
of the variance and the parameters A and 7. Though we can argue similarly in this 
case, note that immediately after the jump, we have dSn/dr = T,'i2 = which results 
itself in a divergence in the distribution in this approximation. So, even in the limit 
A ^ V ■ 5, there is always some probability mass concentrated around the value S', so 
the divergence argument is not so strong for smoother processes. This can be seen in 
figure m 



2.3. Prediction Error 

The prediction error can easily be derived as a function of the filtering error in this 
framework. We have the predictive probability P(Xf_|_^|{iV™([0, t])}), with 6 > 0. The 
average prediction error would simply be the deviation of X^^^ from the estimator 
fi{t + 6). The prediction error is then 

Vi6) = ((X(t + 6)- + 5))(X(t + 6)- fiit + 5))''>x,|A..([o,])} • (18) 

This gives us the matrix S{t) when 6 = 0. For 5 > we have the prediction error 
matrix. Note that given a value of X(t) and a realization of the Wiener process dW(s) 
ioT t < s < t + 6, we have X(t + 6) = j'^^ e-^'+^-''>^ HdW{s) + e-^^X{t). Clearly, 
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Figure 4: The small firing rate approximation provides a very good account of the 
distribution of the variance of the observed process Sn. 



conditioning on X(t) the above average is only over the Wiener process between t and 
t + 5. The estimator + is also given by /i + = e~^^ n{t) in the absence of spikes. 

The prediction error matrix will then be given by 

-t+s 



V{5) = U e-(*+'^-^)^i/dW(s) + e-^^X{t) - e-'^^/i(t)) x 

pt+S \ 

( / e-(*+^-")^i/dW(M) + e-^^X(t) - e-^VW)^ ) (19) 

Jt I X,{Ar'"([0,t])} 

bmce e' is non-anticipating and does not depend on X(t) or N"^{t), we have 

that (see 



t+5 



'(t+S- 



/t+d \ pt+d 

(e-(*+^-^)rifdW(s))^ \ = e-(*+^-")^i/2e-(*+^-")^"du 

and therefore, changing variables, 

V{5) = e-^^ViO)e-^^^ + / e-'^H^e-'^^ ds. (20) 

Jo 

This is the usual relation for the evolution of the variance of linear stochastic processes, 
and it shows us that the prediction error is a simple function of the filtering error. This 
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Figure 5: The evolution of the average prediction error V{6) is completely determined 
by the filtering error V{0). The blue line shows the prediction error obtained from the 
optimal filter in simulations, whereas the green line shows the evolution of the prediction 
error according to fl20|) with the initial condition given by the average filtering error 
obtained in the simulations. The small discrepance between both curves is due to finite 
sample size effects. 



is also a consequence of the Markov nature of the posterior probability. Taking a non- 
Markov prior process would result in a posterior probability whose parameters could 
not be described by a set of ordinary differential equations. 

In figure |5] we show the comparison between the theoretical result in ( l20l) and 
simulation results for the prediction error. We can see that the prediction error is very 
well described by the derived equation. 



3. Optimal Population Coding 

We can now apply the results from the previous section to study the optimal coding 
strategies for neurons in the dynamic case. The study of bell-shaped and more 
specifically Gaussian tuning curves has been frequently discussed in the literature 
[321 ISni El E] . Our framework treats the case of densely packed Gaussian tuning functions 
for stochastic dynamic stimuli. We can study the performance of the tuning functions 
by mapping the minimal mean squared error for an encoder given by a certain tuning 
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function. In our case, given the sensory neurons, specified by tlieir preferred stimuli 0^, 
the tuning functions are specified by the maximal firing rate and the tuning width a. 

figure [6] shows a colormap of the MMSE for an Ornstein-Uhlenbeck process for a 
range of values of a and 0. There are some clear trends to be observed. First, we note 
that for any given value of a, increasing the maximal firing rate (f) leads, not surprisingly, 
to a decrease in the MMSE. When the value of (f) tends to zero, the MMSE tends to 
the equilibrium variance of the observed process, which is expected, as in the absence 
of spikes, the best estimate of the stimulus is given by the equilibrium distribution of 
the OU process. A second interesting observation is that for any given fixed value of 
there is a finite value of a which minimizes the MMSE. This is in accord with findings in 
[2], [3], [20] , which also report that the optimal tuning width is a finite value, as well as with 
experimental data [28l [33]. This can be intuitively understood by noting that when we 
decrease the tuning width keeping constant, we decrease the population's firing rate A, 
and eventually, for very sharp tuning functions, we will have a vanishingly small number 
of spikes. On the other hand, increasing the width a will lead to a gain in information 
through more spikes but eventually these spikes become so uninformative that there is 
no more advantage from increasing a, as these barely give us more information than is 
already present in the equilibrium distribution. The right panel of figure [6] shows the 
MMSE clS cL function of a for some values of (h. 

The situation is very similar when we consider higher-order processes. Although 
the decay of the posterior covariance to its equilibrium value follows a more convoluted 
dynamics, the argument relating a and the improvement in the reconstruction still holds. 
Figure [7] shows the analog of figure [6] for the process with P = 2. We can see that the 
general structure of the color plot is very similar. One important difference though, is 
that for the same firing rate, we are able to obtain a larger improvement in the MMSE, 
relative to the equilibrium variance of the process. This is clear in the right panel of 
figure [71 This is to be expected, as the higher-order processes have more continuity 
constraints and are therefore more predictable. 

3.1. Ecological Analysis of the Optimal Code 

We will now look at how the optimal value of a depends on the parameters of the process. 
We can, fixing 0, 7 and 77 find the optimal value of a that minimizes the MMSE. This will 
give us a mapping of how the optimal tuning width depends on the different parameters 
of the stimulus. We consider the case of the second-order Ornstein-Uhlenbeck process 
(P = 2). Note that changing 7 also influences the signal variance, as the equilibrium 
variance of the stimulus is given by = rj'^/A'-f^. To better separate these influences 
of the timescale and variance of the stimulus, we chose to look at the process given by 
an OU process with diffusion coefficient rj' = rj'j'^^'^ resulting in a equilibrium variance 
of = Tf/A. In that way we can better analyze the influence of the timescale and 
intensity of the covariance of the stimulus. 

This analysis is shown in figure [8] for the second-order OU process. The timescale 
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Figure 6: MMSE landscape as a function of the encoder's tuning width a and maximal 
firing rate for the one-dimensional OU process. For simplicity we show the case where 
7 = ?7 = 1. When either parameters are very near 0, the MMSE is equal to the stimulus' 
equilibrium variance. For fixed values of we observe the existence of a finite optimal 
tuning width, as is shown in the right panel. 

of the process strongly influences the optimal tuning width as can be seen in the figure. 
With an increase in 7 the optimal tuning width increases. This is in accord with the 
finding in [3], where it was reported that for static stimuli a larger decoding window 
leads to a narrower optimal tuning width. This can be cast into our framework by 
making an analogy between the decoding time window and the characteristic time of 
the decay of the autocorrelation of the observed process. A very large correlation time 
means a very slowly changing process, which needs less spikes and therefore allows for 
narrower tuning widths. This can be also understood by referring to the solution for the 
error distribution of the OU process given by f|T^ . Smaller values of 7 allow for a good 
reconstruction of the stimulus with less spikes. An increase in the stimulus variance 
(T^ leads to an increase in the optimal tuning width. Meanwhile, an increase in the 
maximal firing width of each neuron results in a decrease of the optimal tuning width, 
as is expected. Increasing the height of each tuning function allows the tuning functions 
to sharpen without decreasing the overall firing rate of the population. 
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Figure 7: MMSE landscape as a function of the encoder's tuning width a and maximal 
firing rate cf) for the second-order OU process. Again, we show the case where 7 = 77 = 1. 
When either parameters are very near 0, the MMSE is equal to the stimulus' equilibrium 
variance. For fixed values of we observe the existence of a finite optimal tuning width, 
as is shown in the right panel. 

An interesting perspective on the coding- decoding problem we are studying can be 
obtained from rate-distortion theory [M], where one seeks the optimal tradeoff between 
coding rate and reconstruction error (distortion). The celebrated rate-distortion curve 
provides the smallest rate for which a given level of distortion can be achieved (usually 
asymptotically). We study the analog of a rate-distortion curve in our case. The rate is 
given by the average population firing rate A while the distortion would be the minimal 
mean squared error. In figure [9] we show a plot of the MMSE of the optimal encoder 
against the firing rate of the optimal code for given values of 0, 7 and 77. Interestingly, 
the rate-distortion curve is independent of the value of 77. The dependence in 7 is 
as expected. For smaller values of 7 (i.e., for longer correlation times), the error 
decays faster with the population firing rate. Meanwhile, larger values of 7 (i.e., shorter 
correlation times), lead to a slower decay in the distortion as a function of the firing 
rate. 
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a b c 




Figure 8: Dependence of the optimal tuning width on the parameters of the stimulus 
and encoder for the P = 2 process. Panel a shows the optimal tuning width as a 
function of the forcing parameter 7 which is the inverse of the characteristic timescale 
of the stimulus for three different values of the maximal firing rate (j). Panel b shows the 
optimal tuning width as a function of the equilibrium standard deviation of the stimulus 
for the same values of (j). Panel c shows the optimal tuning width as a function of the 
maximal firing rate of the encoding neurons for three values of the inverse time constant 

7- 

3.2. Comparison to Previous Research 

The existence of an optimal finite tuning width for unimodal tuning curves has been 
repeatedly determined in the literature. This has been found to be the case with the 
mutual information as objective function [7] as well as with the reconstruction error as an 
objective function [201 El El [35]. Our findings on the optimal tuning width as a function 
of the correlation time of the observed process, summarized in figure [H generalize the 
findings in [3] and in [2] for dynamic processes. Namely, it was established that longer 
integration times lead to narrower optimal tuning widths. In our case, longer integration 
times correspond to longer correlation times and therefore smaller values of 7. We have 
found that for smaller values of 7, i.e. for longer correlation times, the optimal tuning 
width decreases, as would be expected. Also, as was found in [2], we have established 
that noisier prior processes lead to broader optimal tuning widths (see figure [H]). These 
results seem to hold in a fairly general set of conditions and this indicates that the 
tradeoff between firing rate and accuracy is central to the coding strategies in the nervous 
system. 
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Figure 9: Rate distortion curve of the optimal encoding scheme for the second-order 
OU process. 



4. Discussion 

The temporal aspect of neural coding has often been neglected in studies of optimal 
population coding. A framework for the use of filtering in neural coding had been 
proposed in [9], but even there the offline paradigm was favored, and most studies 
either bypass the issue of time completely or resort to time-binning to address the 
temporal aspects [31 136] . We have extended the filtering theory presented in [H [2] for 
finite-state Markov processes and static processes, where the whole spike train is used 
for decoding and, using a model of the dynamics of the stimulus, we have addressed the 
decay of the information as well. 

Our framework generalizes a number of findings from previous studies. The finding 
from [3] and [2] that the reconstruction error decays with the length of the decoding 
time-window is here framed in terms of the correlation time of the stimulus. We also find 
the same relation between the optimal tuning width for bell-shaped tuning functions 
and the correlation time as had been found for the tuning width as a function of the 
decoding time. 

Another advantage of our treatment is that it allows us to study more complex 
aspects of temporal correlation than just the length of the decoding window. By 
considering a class of higher-order stochastic processes, we can address different temporal 
correlation structures. This had been also proposed in [9] , but not explored any deeper. 
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In that spirit, we have estabhshed that smoother processes allow for a more efficient 
coding strategy with the same ffiing rate in the sensory layer. The complexity of the 
process results in a higher cost in the decoding of the spike train, however, implying that 
although smooth processes allow for very efficient reconstruction strategies, they will 
require a more complex decoding strategy. In [37], a decoding strategy using recoding 
of spikes for RBF processes had been suggested. The approach taken in |T] is to encode 
the dynamics of the process explicitly into the recurrent connections of the decoding 
network, which would be straightforward to generalize for smooth processes, but would 
imply that the number of decoding neurons scales with the order of the process. Clearly, 
biological systems are not concerned with the degree of continuity, and the model of the 
temporal structure of the world should be adapted to the (biological and evolutionary) 
experience of the natural environment. This does, however, suggest a tradeoff between 
the effort spent in optimally coding a stimulus and the effort spent in decoding it. 

Our approach allows for a more flexible and structured way of dealing with the 
temporal aspects in the neural code. The generalization of the filtering scheme for 
more complex neuron models such as generalized linear models is relatively simple, 
although the assumption of dense tuning functions clearly cannot hold then and the 
solution of the filtering problem will not be Gaussian anymore. One can still consider 
a Gaussian approximation to it, as has been done in [38], or resort to particle filter 
approximations [39]. One other interesting direction for further research is to develop 
filtering approaches to biologically inspired neuron models, such as leaky-integrate-and- 
fire models. The hurdles in this case are the same, the Gaussian assumption will not hold 
and we have to work with approximations. However, through a systematic treatment of 
the resulting problems, more insights might be gained into the temporal aspects of the 
neural code. 
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Appendix A. Definition of T and H 

To obtain the process defined in ([1]) we define 

Appendix B. Definition of the Pseudo-determinant 

The pseudo-determinant of a square n x n matrix A is given by 

det{A + al) 



det*(v4) = lim 



ryn—rank{A) 



For positive semi-definite matrices as are used in the text, the pseudo-determinant is 
the product of all non-zero eigenvalues. 
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Appendix C. Boundary Conditions for the Differential 
Chapman-Kolmogorov Equation 

We want to show that in the equihbrium P{s)eq = 0, Vs > 7f /2s. We will proceed by 
cases. If < rf /'2,'^, we have that the jump term will be absent of (fTTj) for s > rf /2s 
and we will have 



with some initial condition P*{s,to) = Po{s), P{s,t) = e~^^^~^°'' P*{s,t) is a solution of 
the first equation with the same initial condition. Therefore P{s,t) < P*{s,t), t > t^. 
But P*{s,t) is the solution of the Liouville equation for the system s = —2^s + rj"^. 
Namely, as t -> cx), P*{s,t) -)■ 6{s - rf/2^). Therefore, Vs > 77^/27, P(s,t) 0, as 
t — )■ 00. 

If > 77^/27, we first proceed in the same manner, but taking s > and imposing 
an absorbing boundary condition at s = a^. Thus we show that P{s,t)eq = 0, Vs > a^. 
We can then subsequently apply the same argument for the intervals 
as long as j^{a^) > rj^ /2'-). Finally, we use the same argument for [77^/27, /'"(a^)], where 
m is the highest integer such that > 77^/27. This shows our desired result. 



dP{s,t) _ d 
dt ds 



{{2^s-ri^)P{s,t))-XP{s,t). 



It is easy to see that given a solution of equation 



dP*{s,t) _ d_ 
dt ds 



{{2^s-7f)P*{s,t)) 



